Details of the input data

First group of samples (to be referred to as control in the rest of the report)

Sample Names:
GSM2858677
GSM2858679
GSM2858681
GSM2858684
GSM2858685
GSM2858687
GSM2858689
GSM2858691
GSM2858693
GSM2858695
GSM2858696
GSM2858698
GSM2858700
GSM2858703
GSM2858708
GSM2858709
GSM2858710
GSM2858714
GSM2858715
GSM2858717
GSM2858720
GSM2858721
GSM2858723
GSM2858724
GSM2858725
GSM2858733
GSM2858737
GSM2858738
GSM2858740
GSM2858741
GSM2858745
GSM2858747
GSM2858750
GSM2858751
GSM2858753
GSM2858755
GSM2858756
GSM2858758
GSM2858759
GSM2858763
GSM2858764
GSM2858768
GSM2858769
GSM2858771
GSM2858774
GSM2858776
GSM2858779
GSM2858781
GSM2858782
GSM2858783
GSM2858786
GSM2858787

Second group of samples (to be referred to as treatment in the rest of the report)

Sample Names:
GSM2858680
GSM2858683
GSM2858697
GSM2858702
GSM2858704
GSM2858706
GSM2858712
GSM2858719
GSM2858722
GSM2858728
GSM2858730
GSM2858731
GSM2858732
GSM2858734
GSM2858735
GSM2858736
GSM2858743
GSM2858744
GSM2858752
GSM2858754
GSM2858757
GSM2858762
GSM2858765
GSM2858770
GSM2858773
GSM2858775
GSM2858777
GSM2858778
GSM2858784
GSM2858785
GSM2858788
GSM2858790
GSM2858791
GSM2858793
GSM2858794
GSM2858795

Note: A positive log fold change shows higher expression in the treatment group; a negative log fold change represents higher expression in the control group.
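The sign convention above can be illustrated with a minimal sketch. The pseudocount and the use of simple group means are assumptions made for illustration only; each DE package estimates the log fold change with its own statistical model.

```python
import math

def log2_fold_change(control_mean, treatment_mean, pseudocount=1.0):
    """Return log2(treatment / control).

    The pseudocount is an illustrative assumption that avoids
    division by zero; it is not taken from the report.
    """
    return math.log2((treatment_mean + pseudocount) / (control_mean + pseudocount))

# Higher expression in treatment gives a positive value.
up_in_treatment = log2_fold_change(control_mean=10.0, treatment_mean=43.0)
# Higher expression in control gives a negative value.
up_in_control = log2_fold_change(control_mean=43.0, treatment_mean=10.0)

print(up_in_treatment > 0, up_in_control < 0)
```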

Data quality control (QC)

Correlation between samples:

Here we show scatterplots comparing expression levels for all genes between the different samples, for i) all controls, ii) all treatment samples and iii) all samples together.

These plots will only be produced when the total number of samples to compare within a group is less than or equal to 10.

Heatmap and clustering showing correlation between replicates

BROWN: higher correlation; YELLOW: lower

Principal Component Analysis

This is a PCA plot of the count values following rlog normalization from the DESeq2 package:


The samples are shown in the 2D plane, positioned by their first two principal components. This type of plot is useful for visualizing the overall effect of experimental covariates and batch effects, and for identifying outlier samples. Control samples and treatment samples may each cluster together.

Visualizing normalization results

These boxplots show the distributions of count data before and after normalization (shown for normalization method DESeq2):

Representation of cpm unfiltered data:

Before normalization:

After normalization:

Gene counts variance distribution

The variance of gene counts across samples is shown. Genes with a variance lower than the selected threshold (dashed grey line) were filtered out.
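A minimal sketch of this kind of variance filter follows. The gene names, counts and threshold value are hypothetical, and the use of the population variance is an assumption; the report does not state which estimator or threshold was used.

```python
from statistics import pvariance

# Hypothetical counts: gene -> counts across four samples.
counts = {
    "geneA": [10, 12, 11, 9],
    "geneB": [5, 50, 100, 200],
    "geneC": [7, 7, 7, 7],
}
threshold = 1.0  # hypothetical cutoff (the dashed line in the plot)

def filter_by_variance(counts, threshold):
    """Keep genes whose variance across samples reaches the threshold."""
    variances = {gene: pvariance(values) for gene, values in counts.items()}
    kept = {gene: values for gene, values in counts.items()
            if variances[gene] >= threshold}
    return kept, variances

kept, variances = filter_by_variance(counts, threshold)
print(sorted(kept))
```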

Samples differences by all counts normalized:

All counts were normalized by the DESeq2 algorithm. These counts were then log10-scaled and plotted in a heatmap.

Percentages of reads per sample mapping to the most highly expressed genes

GSM2858677 GSM2858679 GSM2858681 GSM2858684 GSM2858685 GSM2858687 GSM2858689 GSM2858691 GSM2858693 GSM2858695 GSM2858696 GSM2858698 GSM2858700 GSM2858703 GSM2858708 GSM2858709 GSM2858710 GSM2858714 GSM2858715 GSM2858717 GSM2858720 GSM2858721 GSM2858723 GSM2858724 GSM2858725 GSM2858733 GSM2858737 GSM2858738 GSM2858740 GSM2858741 GSM2858745 GSM2858747 GSM2858750 GSM2858751 GSM2858753 GSM2858755 GSM2858756 GSM2858758 GSM2858759 GSM2858763 GSM2858764 GSM2858768 GSM2858769 GSM2858771 GSM2858774 GSM2858776 GSM2858779 GSM2858781 GSM2858782 GSM2858783 GSM2858786 GSM2858787 GSM2858680 GSM2858683 GSM2858697 GSM2858702 GSM2858704 GSM2858706 GSM2858712 GSM2858719 GSM2858722 GSM2858728 GSM2858730 GSM2858731 GSM2858732 GSM2858734 GSM2858735 GSM2858736 GSM2858743 GSM2858744 GSM2858752 GSM2858754 GSM2858757 GSM2858762 GSM2858765 GSM2858770 GSM2858773 GSM2858775 GSM2858777 GSM2858778 GSM2858784 GSM2858785 GSM2858788 GSM2858790 GSM2858791 GSM2858793 GSM2858794 GSM2858795
4535 0.014 0.014 0.014 0.015 0.015 0.014 0.014 0.015 0.015 0.014 0.014 0.014 0.015 0.015 0.015 0.015 0.014 0.014 0.015 0.014 0.015 0.014 0.015 0.014 0.015 0.015 0.015 0.015 0.015 0.015 0.014 0.015 0.015 0.014 0.014 0.014 0.014 0.014 0.014 0.015 0.015 0.015 0.015 0.014 0.015 0.014 0.015 0.014 0.014 0.015 0.015 0.015 0.014 0.014 0.014 0.014 0.015 0.014 0.014 0.015 0.015 0.015 0.014 0.015 0.014 0.015 0.015 0.014 0.015 0.014 0.015 0.014 0.014 0.014 0.015 0.014 0.014 0.014 0.014 0.015 0.014 0.015 0.015 0.014 0.015 0.014 0.015 0.014
4536 0.014 0.014 0.014 0.015 0.015 0.014 0.014 0.015 0.015 0.014 0.014 0.014 0.015 0.015 0.015 0.015 0.014 0.014 0.015 0.014 0.015 0.014 0.015 0.014 0.015 0.015 0.015 0.015 0.015 0.015 0.014 0.015 0.015 0.014 0.014 0.014 0.014 0.014 0.014 0.015 0.015 0.015 0.015 0.014 0.015 0.014 0.015 0.014 0.014 0.015 0.015 0.015 0.014 0.014 0.014 0.014 0.015 0.014 0.014 0.015 0.015 0.015 0.014 0.015 0.014 0.015 0.015 0.014 0.015 0.014 0.015 0.014 0.014 0.014 0.015 0.014 0.014 0.014 0.014 0.015 0.014 0.015 0.015 0.014 0.015 0.014 0.015 0.014
4537 0.014 0.014 0.014 0.015 0.015 0.014 0.014 0.015 0.015 0.014 0.014 0.014 0.015 0.015 0.015 0.015 0.014 0.014 0.015 0.014 0.015 0.014 0.015 0.014 0.015 0.015 0.015 0.015 0.015 0.015 0.014 0.015 0.015 0.014 0.014 0.014 0.014 0.014 0.014 0.015 0.015 0.015 0.015 0.014 0.015 0.014 0.015 0.014 0.014 0.015 0.015 0.015 0.014 0.014 0.014 0.014 0.015 0.014 0.014 0.015 0.015 0.015 0.014 0.015 0.014 0.015 0.015 0.014 0.015 0.014 0.015 0.014 0.014 0.014 0.015 0.014 0.014 0.014 0.014 0.015 0.014 0.015 0.015 0.014 0.015 0.014 0.015 0.014
4540 0.014 0.014 0.014 0.015 0.015 0.014 0.014 0.015 0.015 0.014 0.014 0.014 0.015 0.015 0.015 0.015 0.014 0.014 0.015 0.014 0.015 0.014 0.015 0.014 0.015 0.015 0.015 0.015 0.015 0.015 0.014 0.015 0.015 0.014 0.014 0.014 0.014 0.014 0.014 0.015 0.015 0.015 0.015 0.014 0.015 0.014 0.015 0.014 0.014 0.015 0.015 0.015 0.014 0.014 0.014 0.014 0.015 0.014 0.014 0.015 0.015 0.015 0.014 0.015 0.014 0.015 0.015 0.014 0.015 0.014 0.015 0.014 0.014 0.014 0.015 0.014 0.014 0.014 0.014 0.015 0.014 0.015 0.015 0.014 0.015 0.014 0.015 0.014
4513 0.014 0.014 0.014 0.015 0.015 0.014 0.014 0.015 0.015 0.014 0.014 0.014 0.015 0.015 0.015 0.015 0.014 0.014 0.015 0.014 0.015 0.014 0.015 0.014 0.015 0.015 0.015 0.015 0.015 0.015 0.014 0.015 0.015 0.014 0.014 0.014 0.014 0.014 0.014 0.015 0.015 0.015 0.015 0.014 0.015 0.014 0.015 0.014 0.014 0.015 0.015 0.015 0.014 0.014 0.014 0.014 0.015 0.014 0.014 0.015 0.015 0.015 0.014 0.015 0.014 0.015 0.015 0.014 0.015 0.014 0.015 0.014 0.014 0.014 0.015 0.014 0.014 0.014 0.014 0.015 0.014 0.015 0.015 0.014 0.015 0.014 0.015 0.014
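As a rough illustration of how such a table could be computed, the sketch below ranks genes by their total count and reports, per sample, the percentage of that sample's reads assigned to each top gene. The ranking rule, gene names and counts are assumptions for illustration, not the report's actual procedure.

```python
def percent_reads_top_genes(counts, n_top=5):
    """counts: gene -> list of per-sample read counts.

    Returns, for the n_top genes with the highest total count,
    the percentage of each sample's reads mapping to that gene.
    Assumes every sample has at least one read.
    """
    n_samples = len(next(iter(counts.values())))
    sample_totals = [sum(row[j] for row in counts.values())
                     for j in range(n_samples)]
    ranked = sorted(counts, key=lambda gene: sum(counts[gene]), reverse=True)
    return {
        gene: [100.0 * counts[gene][j] / sample_totals[j]
               for j in range(n_samples)]
        for gene in ranked[:n_top]
    }

# Hypothetical counts for three genes in two samples.
example = {"g1": [50, 40], "g2": [30, 40], "g3": [20, 20]}
top = percent_reads_top_genes(example, n_top=2)
print(top)
```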

DEgenes Hunter results

Gene classification by DEgenes Hunter

DEgenes Hunter uses multiple DE detection packages to analyse all genes in the input count table and labels them accordingly:

  • Filtered out: Genes discarded during the filtering process as showing no or very low expression.
  • Prevalent DEG: Genes considered as differentially expressed (DE) by at least 4 packages, as specified by the minpack_common argument.
  • Possible DEG: Genes considered DE by at least one of the DE detection packages.
  • Not DEG: Genes not considered DE in any package.

This barplot shows the total number of genes passing each stage of analysis - from the total number of genes in the input table of counts, to the genes surviving the expression filter, to the genes detected as DE by one package, to the genes detected by at least 4 packages.
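The labelling rules above can be sketched as a small function. The function name and the input layout (one boolean per package) are hypothetical; only the labels and the minpack_common value of 4 come from this report.

```python
def classify_gene(passed_filter, de_calls, minpack_common=4):
    """Label a gene from per-package DE calls.

    passed_filter: True if the gene survived the expression filter.
    de_calls: package name -> True if that package called the gene DE.
    """
    if not passed_filter:
        return "Filtered out"
    n_packages = sum(1 for called in de_calls.values() if called)
    if n_packages >= minpack_common:
        return "Prevalent DEG"
    if n_packages >= 1:
        return "Possible DEG"
    return "Not DEG"

calls = {"DESeq2": True, "edgeR": True, "limma": False, "NOISeq": False}
print(classify_gene(True, calls))
```

With minpack_common set to 4 and four DE detection packages, a gene is labelled as a prevalent DEG only when every package agrees.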

Package DEG detection stats

This is the Venn diagram of all possible DE genes (DEGs) according to at least one of the DE detection packages employed:

Plot showing variability between different DEG detection methods in terms of logFC calculation

This graph shows the logFC calculated (y-axis) by each package (points) for each gene (x-axis). Only genes with variability over 0.01 are plotted. This representation allows the user to observe the behaviour of each DE package and see whether any of them gives atypical results.

If there are no genes showing sufficient variance in estimated logFC across methods, no plot will be produced and a warning message will be given.
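The selection of genes for this plot might look like the sketch below. It assumes that "variability" means the variance of the logFC estimates across packages; the report does not define the measure, and the logFC values here are hypothetical.

```python
from statistics import pvariance

# Hypothetical logFC estimates per gene and package.
logfc = {
    "gene1": {"DESeq2": 1.20, "edgeR": 1.22, "limma": 1.19, "NOISeq": 1.21},
    "gene2": {"DESeq2": 0.50, "edgeR": 1.50, "limma": 1.00, "NOISeq": 0.20},
}

def genes_to_plot(logfc, min_variability=0.01):
    """Return genes whose logFC variance across packages exceeds the cutoff."""
    return [gene for gene, estimates in logfc.items()
            if pvariance(estimates.values()) > min_variability]

plotted = genes_to_plot(logfc)
if not plotted:
    print("No genes with sufficient logFC variability; no plot produced")
else:
    print(plotted)
```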

FDR gene-wise benchmarking

Benchmark of false positive calling:

Boxplot of FDR values among all genes with an FDR <= 0.05 in at least one DE detection package

## No Prevalent DEGs found, Bar charts of FDR values for prevalent genes cannot be shown

The complete results of the DEgenes Hunter differential expression analysis can be found in the “hunter_results_table.txt” file in the Common_results folder

DE detection package specific results

Various plots specific to each package are shown below:

DESeq2 normalization effects:

This plot compares the effective library size with raw library size

The effective library size is the factor used by the DESeq2 normalization algorithm for each sample. The effective library size should depend on the raw library size.
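To make the relationship concrete, here is a minimal sketch of the median-of-ratios idea behind DESeq2 size factors, assuming that this is the factor the plot refers to. It is a simplified re-implementation for illustration with hypothetical counts, not the code run by the package.

```python
import math
from statistics import median

def size_factors(counts):
    """counts: list of gene rows, each a list of per-sample counts.

    For each sample, take the median over genes of the ratio between
    the sample's count and the gene's geometric mean across samples.
    Genes with a zero count in any sample are skipped.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples
                     for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

# Hypothetical counts: sample 2 was sequenced twice as deeply as sample 1.
counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
factors = size_factors(counts)
raw_library_sizes = [sum(row[j] for row in counts) for j in range(2)]
print(factors, raw_library_sizes)
```

In this toy input the second sample has roughly twice the raw library size and receives twice the size factor, which is the kind of dependence the plot is meant to check.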

DESeq2 MA plot:

This is the MA plot from DESeq2 package:

In DESeq2, the MA-plot (log ratio versus abundance) shows the log2 fold changes attributable to a given variable over the mean of normalized counts. Points are colored red if the adjusted p-value is less than 0.1. Points which fall outside the window are plotted as open triangles pointing either up or down.
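The plotting rules described above can be sketched as follows. The function name, the y-axis limit, the grey default color and the example values are hypothetical; only the 0.1 cutoff, the red color and the triangle convention come from the description.

```python
def ma_points(results, alpha=0.1, ylim=2.0):
    """Turn (gene, mean_normalized_count, log2fc, padj) tuples into plot points.

    ylim is a hypothetical half-height of the plotting window.
    """
    points = []
    for gene, base_mean, lfc, padj in results:
        # Genes without an adjusted p-value are never highlighted.
        color = "red" if padj is not None and padj < alpha else "grey"
        if lfc > ylim:
            marker, y = "triangle_up", ylim
        elif lfc < -ylim:
            marker, y = "triangle_down", -ylim
        else:
            marker, y = "circle", lfc
        points.append({"gene": gene, "x": base_mean, "y": y,
                       "color": color, "marker": marker})
    return points

results = [
    ("g1", 150.0, 0.3, 0.80),
    ("g2", 900.0, 1.4, 0.02),
    ("g3", 12.0, -3.1, 0.04),
    ("g4", 40.0, 0.1, None),
]
points = ma_points(results)
for point in points:
    print(point)
```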

A table containing the DESeq2 DEGs is provided in Results_DESeq2/DEgenes_DESEq2.txt

A table containing the DESeq2 normalized counts is provided in Results_DESeq2/Normalized_counts_DESEq2.txt

Differences between samples by PREVALENT DEGs normalized counts:

Counts of prevalent DEGs were normalized by the DESeq2 algorithm. These counts were then log10-scaled and plotted in a heatmap.

## Lower than 2 prevalent differential expression were found

edgeR MA plot

This is the MA plot from edgeR package:

Differential gene expression data can be visualized as MA-plots (log ratio versus abundance) where each dot represents a gene. The differentially expressed genes are colored red and the non-differentially expressed ones are colored black.

A table containing the edgeR DEGs is provided in Results_edgeR/DEgenes_edgeR.txt

A table containing the edgeR normalized counts is provided in Results_edgeR/Normalized_counts_edgeR.txt

limma Volcano plot

Volcano plot of log2-fold change versus -log10 of adjusted p-values for all genes according to the analysis with limma:

A table containing the limma DEGs is provided in Results_limma/DEgenes_limma.txt

A table containing the limma normalized counts is provided in Results_limma/Normalized_counts_limma.txt

NOISeq Expressionplot

This is the summary plot for (M,D) values (black) and the differentially expressed genes (red) from the NOISeq package (image extracted from the ExpressionPlot.pdf file):

A table containing the NOISeq DEGs is provided in Results_NOISeq/DEgenes_NOISeq.txt.

A table containing the NOISeq normalized counts is provided in Results_NOISeq/Normalized_counts_NOISeq.txt

WGCNA Results

WGCNA was run to look for modules (clusters) of coexpressed genes. These modules were then compared with the sample factors to look for correlation. If no sample factors were specified, this comparison was performed with treatment/control labels.

The following graphic shows the power value chosen for building clusters. The power is chosen by looking at the characteristics of the network produced.

In total there were 11 clusters. The following plot shows the number of genes per cluster:

Module Membership distribution

Cluster assignment vs lower module membership (MM)

This plot shows, for each gene, the cluster ID assigned by WGCNA vs. the cluster whose eigengene has the highest correlation with that gene (module membership/MM).
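The comparison behind this plot can be sketched as below: for each gene, correlate its expression with every eigengene and report the best match next to the assigned cluster. Using the signed Pearson correlation is an assumption, and all names and values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den

def best_module(expression, assigned, eigengenes):
    """Return gene -> (assigned module, module with highest membership)."""
    result = {}
    for gene, values in expression.items():
        membership = {name: pearson(values, eig)
                      for name, eig in eigengenes.items()}
        result[gene] = (assigned[gene], max(membership, key=membership.get))
    return result

eigengenes = {"ME1": [1, 2, 3, 4], "ME2": [4, 1, 3, 2]}
expression = {"geneA": [2, 4, 6, 8.5], "geneB": [8, 2, 6, 5]}
assigned = {"geneA": "ME1", "geneB": "ME1"}
comparison = best_module(expression, assigned, eigengenes)
print(comparison)
```

Genes such as geneB in this toy input, whose best-matching eigengene differs from the assigned cluster, are the ones that fall off the diagonal in a plot of this kind.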

Cluster vs. factors correlation

The following plots show the correlation between the different modules and specified factors. This is done using eigengenes, which can be broadly thought of as the average expression pattern for the genes in a given cluster. MEn refers to the eigengene for cluster n.

This plot shows the correlation between clusters (eigengenes) and factors directly.

WGCNA Eigen values clustering

WGCNA dendrogram showing distances between these eigengenes along with the factors. Distances have been calculated using signed correlation, so the nearer two elements are, the more positive the correlation between them.

Eigen values clustering (Absolute correlation)

WGCNA-like dendrogram showing distances between these eigengenes along with the factors. Distances have been calculated using absolute correlation, so the nearer two elements are, the higher the absolute correlation between them.

Correlation network between modules and factors

This plot shows modules (black) and factors (green) as nodes. Correlation coefficients over 0.8 (red) and under -0.8 (blue) are represented as edges.
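A minimal sketch of how the edges of such a network could be derived is given below. Pearson correlation is assumed, and the eigengene and factor values are hypothetical; only the 0.8 and -0.8 cutoffs and the edge colors come from the description.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den

def network_edges(modules, factors, cutoff=0.8):
    """Return (module, factor, color) for each strong correlation."""
    edges = []
    for module, module_values in modules.items():
        for factor, factor_values in factors.items():
            r = pearson(module_values, factor_values)
            if r > cutoff:
                edges.append((module, factor, "red"))
            elif r < -cutoff:
                edges.append((module, factor, "blue"))
    return edges

modules = {"ME1": [1, 2, 3, 4], "ME2": [4, 3, 2, 1], "ME3": [1, -1, 1, -1]}
factors = {"treat": [0.1, 0.2, 0.35, 0.4]}
edges = network_edges(modules, factors)
print(edges)
```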

Correlation between all clusters and factors

Detailed package results comparison

This is an advanced section for comparing the output of the packages used to perform the data analysis. The data shown here do not necessarily have any biological implication.

P-value Distributions

Distributions of p-values, unadjusted and adjusted for multiple testing (FDR)

FDR Correlations

Correlations of p-values adjusted for multiple testing (FDR) and of log fold change.

Values of options used to run DEGenesHunter

First column contains the option names; second column contains the given values for each option in this run.

opt
minpack_common 4
p_val_cutoff 0.05
lfc 1
modules DELNW
active_modules 4